88 research outputs found

    Fast location of similar code fragments using semantic 'juice'

    Abstraction of semantics of blocks of a binary is termed as 'juice.' Whereas the denotational semantics summarizes the computation performed by a block, its juice presents a template of the relationships established by the block. BinJuice is a tool for extracting the 'juice' of a binary. It symbolically interprets individual blocks of a binary to extract their semantics: the effect of the block on the program state. The semantics is generalized to juice by replacing register names and literal constants by typed, logical variables. The juice also maintains algebraic constraints between the numeric variables. Thus, this juice forms a semantic template that is expected to be identical regardless of code variations due to register renaming, memory address allocation, and constant replacement. The terms in juice can be canonically ordered using a linear order presented. Thus semantically equivalent (rather, similar) code fragments can be identified by simple structural comparison of their juice, or by comparing their hashes. While BinJuice cannot find all equivalent constructs, for that would solve the Halting Problem, it does significantly improve the state-of-the-art in both the computational complexity as well as the set of equivalences it can establish. Preliminary results show that juice is effective in pairing code variants created by post-compile obfuscating transformations
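
The generalization step described in this abstract can be illustrated with a minimal sketch. It is not BinJuice itself. It assumes a block's semantics is already available as simple text assignments, uses a small hypothetical register set, and omits the algebraic constraints and canonical term ordering that the abstract mentions. The names `juice`, `juice_hash`, and `REGISTERS` are illustrative only.

```python
import hashlib
import re

# Hypothetical register set, used only for this illustration.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi"}
# Identifiers, decimal literals, or any single non-space symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def juice(semantics):
    """Replace registers and literals with typed variables (R1.., N1..),
    numbered in order of first appearance within the block."""
    names = {}
    counts = {"R": 0, "N": 0}

    def var(kind, token):
        key = (kind, token)
        if key not in names:
            counts[kind] += 1
            names[key] = f"{kind}{counts[kind]}"
        return names[key]

    out = []
    for line in semantics:
        tokens = []
        for tok in TOKEN.findall(line):
            if tok in REGISTERS:
                tokens.append(var("R", tok))
            elif tok.isdigit():
                tokens.append(var("N", tok))
            else:
                tokens.append(tok)
        out.append(" ".join(tokens))
    return out


def juice_hash(semantics):
    """Hash the generalized template so blocks can be compared cheaply."""
    return hashlib.sha256("\n".join(juice(semantics)).encode()).hexdigest()


# Two variants that differ only in register names and constants.
block_a = ["eax = ebx + 4", "ecx = eax * 2"]
block_b = ["edx = esi + 8", "edi = edx * 16"]
print(juice(block_a))
print(juice_hash(block_a) == juice_hash(block_b))
```

Under these assumptions both blocks reduce to the same template, so comparing hashes is enough to pair them. A block with a different data-flow shape yields a different template.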

    Function similarity using family context

    Finding changed and similar functions between a pair of binaries is an important problem in malware attribution and for the identification of new malware capabilities. This paper presents a new technique called Function Similarity using Family Context (FSFC) for this problem. FSFC trains a Support Vector Machine (SVM) model using pairs of similar functions from two program variants. This method improves upon previous research called Cross Version Contextual Function Similarity (CVCFS) by representing a function using features extracted not just from the function itself, but also from other functions with which it has a caller and callee relationship. We present the results of an initial experiment that shows that the use of additional features from the context of a function significantly decreases the false positive rate, obviating the need for a separate pass for cleaning false positives. The more surprising and unexpected finding is that the SVM model produced by FSFC can abstract function similarity features from one pair of program variants to find similar functions in an unrelated pair of program variants. If validated by a larger study, this new property leads to the possibility of creating generic similar function classifiers that can be packaged and distributed in reverse engineering tools such as IDA Pro and Ghidra. This research was performed in the Internet Commerce Security Lab (ICSL), which is a joint venture with research partners Westpac, IBM, and Federation University Australia
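
The idea of describing a function by its own features plus those of its callers and callees can be sketched as follows. This is an illustration under stated assumptions, not the FSFC feature set. The abstract does not say which features are used or how context is aggregated, so the per-function numeric features, the summing of context features, and the name `context_features` are all hypothetical. SVM training is left out.

```python
def context_features(name, own, callees):
    """Concatenate a function's own features with the summed features of
    its callers and of its callees.

    own:     dict mapping function name -> list of numeric features
    callees: dict mapping function name -> list of functions it calls
    """
    # Callers are recovered by inverting the callee relation.
    callers = [f for f, called in callees.items() if name in called]

    def summed(names):
        total = [0] * len(own[name])
        for n in names:
            for i, value in enumerate(own[n]):
                total[i] += value
        return total

    return own[name] + summed(callers) + summed(callees.get(name, []))


# Hypothetical features per function, e.g. [instruction count, API calls].
own = {"main": [10, 2], "parse": [5, 1], "log": [3, 0]}
callees = {"main": ["parse", "log"], "parse": ["log"]}

print(context_features("parse", own, callees))
print(context_features("log", own, callees))
```

Vectors built this way for a candidate pair of functions could then be combined and passed to a classifier. Which combination and which classifier settings work well is an empirical question that this sketch does not address.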

    Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of executables

    We introduce a model for mixed syntactic/semantic approximation of programs based on symbolic finite automata (SFA). The edges of SFA are labeled by predicates whose semantics specifies the denotations that are allowed by the edge. We introduce the notion of abstract symbolic finite automaton (ASFA) where approximation is made by abstract interpretation of symbolic finite automata, acting both at syntactic (predicate) and semantic (denotation) level. We investigate in detail how the syntactic and semantic abstractions of SFA relate to each other and contribute to the determination of the recognized language. Then we introduce a family of transformations for simplifying ASFA. We apply this model to prove properties of commonly used tools for similarity analysis of binary executables. Following the structure of their control flow graphs, disassembled binary executables are represented as (concrete) SFA, where states are program points and predicates represent the (possibly infinite) I/O semantics of each basic block in a constraint form. Known tools for binary code analysis are viewed as specific choices of symbolic and semantic abstractions in our framework, making symbolic finite automata and their abstract interpretations a unifying model for comparing and reasoning about soundness and completeness of analyses of low-level code

    In situ reuse of logically extracted functional components

    Programmers often identify functionality within a compiled program that they wish they could reuse in a manner other than that intended by the program's original authors. The traditional approach to reusing pre-existing functionality contained within a binary executable is that of physical extraction; that is, the recreation of the desired functionality in some executable module separate from the program in which it was originally found. Towards overcoming the inherent limitations of physical extraction, we propose in situ reuse of logically extracted functional components. Logical extraction consists of identifying and retaining information about the locations of the elements comprising the functional component within its original program, and in situ reuse is the process of driving the original program to execute the logically extracted functional component in whatever manner the new programmer sees fit

    Identification and characterization of miRNAome in root, stem, leaf and tuber developmental stages of potato (Solanum tuberosum L.) by high-throughput sequencing

    BACKGROUND: MicroRNAs (miRNAs) are ubiquitous components of endogenous plant transcriptome. miRNAs are small, single-stranded and ~21 nt long RNAs which regulate gene expression at the post-transcriptional level and are known to play essential roles in various aspects of plant development and growth. Previously, a number of miRNAs have been identified in potato through in silico analysis and deep sequencing approach. However, identification of miRNAs through deep sequencing approach was limited to a few tissue types and developmental stages. This study reports the identification and characterization of potato miRNAs in three different vegetative tissues and four stages of tuber development by high throughput sequencing. RESULTS: Small RNA libraries were constructed from leaf, stem, root and four early developmental stages of tuberization and subjected to deep sequencing, followed by bioinformatics analysis. A total of 89 conserved miRNAs (belonging to 33 families), 147 potato-specific miRNAs (with star sequence) and 112 candidate potato-specific miRNAs (without star sequence) were identified. The digital expression profiling based on TPM (Transcripts Per Million) and qRT-PCR analysis of conserved and potato-specific miRNAs revealed that some of the miRNAs showed tissue specific expression (leaf, stem and root) while a few demonstrated tuberization stage-specific expressions. Targets were predicted for identified conserved and potato-specific miRNAs, and predicted targets of four conserved miRNAs, miR160, miR164, miR172 and miR171, which are ARF16 (Auxin Response Factor 16), NAM (NO APICAL MERISTEM), RAP1 (Relative to APETALA2 1) and HAM (HAIRY MERISTEM) respectively, were experimentally validated using 5′ RLM-RACE (RNA ligase mediated rapid amplification of cDNA ends). Gene ontology (GO) analysis for potato-specific miRNAs was also performed to predict their potential biological functions. 
CONCLUSIONS: We report a comprehensive study of potato miRNAs at genome-wide level by high-throughput sequencing and demonstrate that these miRNAs have tissue and/or developmental stage-specific expression profile. Also, predicted targets of conserved miRNAs were experimentally confirmed for the first time in potato. Our findings indicate the existence of extensive and complex small RNA population in this crop and suggest their important role in pathways involved in diverse biological processes, including tuber development

    Constructing Call Multigraphs Using Dependence Graphs
